Boosting versus Covering

Authors

  • Kohei Hatano
  • Manfred K. Warmuth
Abstract

We investigate improvements of AdaBoost that can exploit the fact that the weak hypotheses are one-sided, i.e. either all of their positive (or all of their negative) predictions are correct. In particular, for any set of m labeled examples consistent with a disjunction of k literals (which are one-sided in this case), AdaBoost constructs a consistent hypothesis using O(k² log m) iterations. On the other hand, a greedy set covering algorithm finds a consistent hypothesis of size O(k log m). Our primary question is whether there is a simple boosting algorithm that performs as well as greedy set covering. We first show that InfoBoost, a modification of AdaBoost proposed by Aslam for a different purpose, does perform as well as the greedy set covering algorithm. We then show that AdaBoost requires Ω(k² log m) iterations for learning k-literal disjunctions; we establish this with an adversary construction as well as in simple experiments on artificial data. Further, we give a variant called SemiBoost that can handle the degenerate case in which all the given examples have the same label. We conclude by showing that SemiBoost can also be used to produce small conjunctions.
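The greedy set covering baseline discussed in the abstract can be sketched as follows. This is an illustrative reconstruction, not code from the paper: the function name, the data layout, and the restriction to monotone (positive) literals are our own simplifying assumptions. Each candidate literal is one-sided by construction, since it is only admitted if it never fires on a negative example; the greedy step then repeatedly picks the literal covering the most still-uncovered positives.

```python
def greedy_disjunction(examples):
    """Greedily build a disjunction of variables consistent with the data.

    examples: list of (x, y) pairs, x a tuple of 0/1 features, y in {0, 1}.
    Returns a list of variable indices whose disjunction classifies every
    example correctly, or None if no such monotone disjunction exists.
    """
    n = len(examples[0][0])
    # One-sided candidates: variables that are 0 on every negative example,
    # so including them can never produce a false positive.
    candidates = [j for j in range(n)
                  if all(x[j] == 0 for x, y in examples if y == 0)]
    uncovered = {i for i, (x, y) in enumerate(examples) if y == 1}
    chosen = []
    while uncovered:
        # Greedy step: take the candidate covering the most uncovered positives.
        best = max(candidates,
                   key=lambda j: sum(examples[i][0][j] for i in uncovered),
                   default=None)
        if best is None or sum(examples[i][0][best] for i in uncovered) == 0:
            return None  # some positive cannot be covered one-sidedly
        chosen.append(best)
        uncovered = {i for i in uncovered if examples[i][0][best] == 0}
    return chosen
```

With data consistent with a k-literal disjunction, the standard set-cover analysis gives the O(k log m) size bound quoted above: each greedy pick covers at least a 1/k fraction of the remaining positives.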


Related articles

A Boosting Algorithm for Label Covering in Multilabel Problems

We describe, analyze and experiment with a boosting algorithm for multilabel categorization problems. Our algorithm includes as special cases previously studied boosting algorithms such as Adaboost.MH. We cast the multilabel problem as multiple binary decision problems, based on a user-defined covering of the set of labels. We prove a lower bound on the progress made by our algorithm on each bo...

Full text

Beyond Sequential Covering - Boosted Decision Rules

From the beginning of machine learning, rule induction has been regarded as one of the most important issues in this research area. One of the first rule induction algorithms was AQ, introduced by Michalski in the early '80s. AQ, like several other well-known algorithms such as CN2 and Ripper, is based on sequential covering. With the advancement of machine learning, some new techniques ...

Full text

A modified NSGA-II solution for a new multi-objective hub maximal covering problem under uncertain shipments

Hubs are centers for the collection, rearrangement, and redistribution of commodities in transportation networks. In this paper, non-linear multi-objective formulations for single and multiple allocation hub maximal covering problems, as well as their linearized versions, are proposed. The formulations substantially reduce the complexity of the existing models owing to the smaller number of constraints and v...

Full text

Boosted Regression (Boosting): An introductory tutorial and a Stata plugin

Boosting, or boosted regression, is a recent data mining technique that has shown considerable success in predictive accuracy. This article gives an overview of boosting and introduces a new Stata command, boost, that implements the boosting algorithm described in Hastie et al. (2001, p. 322). The plugin is illustrated with a Gaussian and a logistic regression example. In the Gaussian regress...

Full text


Journal title:

Volume   Issue

Pages  -

Publication date: 2003